This workshop covers the basics of using R as a GIS and gives a teaser of more advanced spatial analyses. In review, GIS stands for Geographic Information Systems. Geographic Information Systems are "a specialist information technology (IT) used for handling mapped data" (Burrough, McDonnell, & Lloyd, 2015). Traditionally, these systems have been GUI (point-and-click) applications that can accept spatial data in a variety of different formats. Increasingly, GIS-related tasks are accomplished in a coding environment such as R or Python. Historically, that coding was done in a Python console within ESRI's ArcGIS or its open-source alternative, QGIS, and those coding efforts were generally limited.
Although there are certain GIS tasks that are still best suited to a traditional GIS environment, there are equally many that are well suited to R or to a Conda environment for Python. The control and reproducibility offered by a coding environment are leaps and bounds ahead of anything a traditional GIS can do. As the capabilities for spatial data manipulation and analysis in R grow, more people, GIS specialists and novices alike, are looking to use R as a GIS.
Before diving into the material, it is important to lay out two assumptions: 1. The presenter assumes all attendees have a basic working knowledge of ‘R’. 2. The presenter assumes that attendees have little to no knowledge of GIS and spatial data.
If you have no knowledge of ‘R’ you are of course still welcome! Just reach out to another attendee, one of the hosts, or the presenter for additional help as you need it! If you are an advanced GIS user, I hope there is still something you can take away from this presentation. Hopefully, you will be an asset to others in attendance today :).
So, let’s get started!
The code below installs and then loads the packages necessary for today's workshop.
# Install packages
install.packages("sp") # Classes and methods for spatial data
install.packages("rgdal") # Access to projection/transformation operations
install.packages("rgeos") # Manipulation and querying of spatial geometries
install.packages("tidyverse") # For basic data manipulation
install.packages("raster") # Works with spatial data primarily raster but also vector
install.packages("tmap") # For mapping spatial data
install.packages("tmaptools") # Additional tools for the tmap package
install.packages("leaflet") # Open-source JavaScript libraries for interactive maps
install.packages("RColorBrewer") # Pre-packaged color palettes
install.packages("spgwr") # geographically weighted regression
install.packages("jsonlite") # high performance tools for working with JSON
# Load packages
library(sp)
library(rgdal)
library(rgeos)
library(tidyverse)
library(raster)
library(tmap)
library(tmaptools)
library(leaflet)
library(RColorBrewer)
library(spgwr)
library(jsonlite)
When doing an analysis, spatial or not, we tend to have tabular data that we need to manipulate. In this case, we will be using data from the American Community Survey (ACS). The ACS "is an ongoing survey that provides vital information on a yearly basis about our nation and its people" (US Census Bureau, 2018). The data is open to the public and has a known spatial component. You can analyze it at scales from an area as small as a census tract (generally 2,500 - 8,000 people) up through regional or national levels. To start, we are going to focus on the variable 'median earnings in the last 12 months'.
Quick note: I have provided the data and shapefiles we will need for today. This is to expedite the workshop but usually you would need to access the ACS or Census data via an API. In ‘R’ this can be done, most easily, using the tidycensus package. More information on the tidycensus package is located here. You will need to request an API key for tidycensus to work. This can take some time, so I would not count on using the API during today’s workshop.
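For reference, here is a hedged sketch of what that tidycensus pull could look like once you have an API key. The variable code, year, and key placeholder are illustrative; nothing below is needed for today's workshop.

```r
# Not run today - requires a Census API key and network access.
library(tidycensus)

# Register your key for the session (install = TRUE would save it for reuse)
census_api_key("YOUR_KEY_HERE")

# Pull median earnings in the past 12 months (B08121_001) for
# Philadelphia County census tracts from the 2017 ACS
acs_api <- get_acs(
  geography = "tract",
  variables = "B08121_001",
  state     = "PA",
  county    = "Philadelphia",
  year      = 2017
)

head(acs_api)
```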
# Set working directory to a location that works best for you.
setwd("C:/Users/mary.lennon/Documents/RLadies_Spatial")
# Read in the provided ACS data
acs_data <- read.csv('./ACS_Data/acs_data.csv')
# Check that everything loaded
head(acs_data)
## GEOID NAME
## 1 4.2101e+10 Census Tract 1, Philadelphia County, Pennsylvania
## 2 4.2101e+10 Census Tract 2, Philadelphia County, Pennsylvania
## 3 4.2101e+10 Census Tract 3, Philadelphia County, Pennsylvania
## 4 4.2101e+10 Census Tract 4.01, Philadelphia County, Pennsylvania
## 5 4.2101e+10 Census Tract 4.02, Philadelphia County, Pennsylvania
## 6 4.2101e+10 Census Tract 5, Philadelphia County, Pennsylvania
## variable B08121_001_estimate B08121_001_moe
## 1 B08121_001 63650 9854
## 2 B08121_001 40472 17900
## 3 B08121_001 70300 9533
## 4 B08121_001 54250 7936
## 5 B08121_001 61542 12941
## 6 B08121_001 52625 6852
## Total_Worked_At_Home_estimate Total_Worked_At_Home_021_moe
## 1 106 62
## 2 8 13
## 3 147 63
## 4 162 82
## 5 82 58
## 6 36 21
## Total_Married_Couple_Family_estimate Total_Married_Couple_Family_moe
## 1 511 156
## 2 438 86
## 3 554 114
## 4 465 106
## 5 490 151
## 6 122 39
## Total_PopOneRace_White_estimate Total_PopOneRace_White_moe
## 1 3020 405
## 2 825 271
## 3 2740 323
## 4 1605 210
## 5 2721 417
## 6 1126 231
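A quick aside before moving on: GEOID printed above in scientific notation (4.2101e+10) because read.csv parsed it as a number. Identifier columns are generally safer read as character so that no digits are lost and joins behave predictably. A minimal sketch, using an inline CSV as a stand-in for the ACS file:

```r
# A tiny stand-in for the ACS CSV (values illustrative)
csv_text <- "GEOID,NAME
42101000100,Census Tract 1
42101000200,Census Tract 2"

# colClasses keeps GEOID as character instead of coercing it to numeric
acs_small <- read.csv(text = csv_text, colClasses = c(GEOID = "character"))

class(acs_small$GEOID) # "character" - full 11-digit codes preserved
```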
Spatial data comes in many shapes and sizes: GPX, Shapefile, GeoJSON, and more. The most common format is the shapefile. A shapefile stores both the "geometric location and attribute information of geographic features" (ESRI). Vector data is made of individual points that have x/y (longitude/latitude) coordinates. These points can be joined to create a line or a polygon. The spatial data we are most likely to come across in our work is vector data, so that is all we will use today. Differing from vector data is raster data. Rasters are made up of pixels, or cells, each with an associated value. "Rasters contain additional information describing where the edges of the image are located in the real world, together with how big each cell is on the ground. This means that your GIS can position your raster images correctly relative to one another, and this allows you to build up your map" (Stack Exchange).
Raster v. Vector (Mullen, 2015)
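To make the vector/raster distinction concrete, here is a minimal base-R sketch (all coordinates and values invented): a vector feature is just a set of coordinate pairs, while a raster is a grid of cell values plus an extent that anchors it to the ground.

```r
# Vector: a polygon as a closed ring of x/y (lon/lat) coordinates
poly_coords <- matrix(
  c(-75.20, 39.95,   # each row is one vertex
    -75.10, 39.95,
    -75.10, 40.00,
    -75.20, 40.00,
    -75.20, 39.95),  # first vertex repeated to close the ring
  ncol = 2, byrow = TRUE
)

# Raster: a grid of cell values plus an extent (where the edges sit)
raster_values <- matrix(runif(20), nrow = 4, ncol = 5)
raster_extent <- c(xmin = -75.20, xmax = -75.10, ymin = 39.95, ymax = 40.00)

# Cell size falls out of the extent and the grid dimensions
cell_width  <- (raster_extent["xmax"] - raster_extent["xmin"]) / ncol(raster_values)
cell_height <- (raster_extent["ymax"] - raster_extent["ymin"]) / nrow(raster_values)
```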
During this workshop we will load point, line, and polygon shapefiles into R. First, we load our polygons: census tracts that we can pair with our ACS survey results.
Please Note: You have downloaded several shapefiles for this workshop, and each is contained within its own sub-folder of our working directory. If you open those folders you will see several files with the same name but different file extensions.
.shp - Contains the coordinate (geometry) information for your shapefile
.dbf - The tabular attribute data associated with the geographic features
.prj - The projection of the shapefile; see CRS, explained later in this workshop
.shx - The spatial index
There may be more files, but these four are the only ones required for a shapefile. We will dive into this in the code below.
# Read in the shapefile
census_boundaries <-
rgdal::readOGR("./Census_Boundaries",
layer = "cb_2017_42_tract_500k")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\mary.lennon\Documents\RLadies_Spatial\Census_Boundaries", layer: "cb_2017_42_tract_500k"
## with 3217 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
# Is it true?!
# Did we just load spatial data?!
# I'm skeptical.
plot(census_boundaries)
# But look there it is!!!!!!!
# Now, let's look at the structure of the spatial data in R. This is going to be different
# from what you are used to.
# Let's first run a test to see what happens if we try to look at the head of the data as we normally would.
# Remember, the head() function lets you see the top 6 observations of the data. This is a good
# idea as it allows us to inspect the data we are working with.
head(census_boundaries)
## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD
## 0 42 003 020100 1400000US42003020100 42003020100 201 CT
## 1 42 003 051100 1400000US42003051100 42003051100 511 CT
## 2 42 003 070800 1400000US42003070800 42003070800 708 CT
## 3 42 003 080700 1400000US42003080700 42003080700 807 CT
## 4 42 003 130100 1400000US42003130100 42003130100 1301 CT
## 5 42 003 160800 1400000US42003160800 42003160800 1608 CT
## ALAND AWATER
## 0 1678102 483177
## 1 190118 0
## 2 446679 0
## 3 275058 0
## 4 772138 0
## 5 1029321 0
Wow, that was a lot of information and it’s not formatted the way I am used to.
What you just saw printed out, all at once, were a bunch of different "slots" of the spatial object. The slot we are most interested in is the '@data' slot, as it contains the attribute information for the shapefile.
head(census_boundaries@data)
## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD
## 0 42 003 020100 1400000US42003020100 42003020100 201 CT
## 1 42 003 051100 1400000US42003051100 42003051100 511 CT
## 2 42 003 070800 1400000US42003070800 42003070800 708 CT
## 3 42 003 080700 1400000US42003080700 42003080700 807 CT
## 4 42 003 130100 1400000US42003130100 42003130100 1301 CT
## 5 42 003 160800 1400000US42003160800 42003160800 1608 CT
## ALAND AWATER
## 0 1678102 483177
## 1 190118 0
## 2 446679 0
## 3 275058 0
## 4 772138 0
## 5 1029321 0
These results look much more familiar. While accessing the data slot you probably noticed other slots (polygons, plotOrder, bbox, and proj4string). We aren’t going to dig into these much but will explore proj4string later.
Looking at this tabular data we can't tell a whole lot. STATEFP, COUNTYFP, and so on may not mean much to anyone who has never worked with census data before. In short, they are coded identifiers for the different spatial polygons that can be used to link the polygons to the ACS data.
This process is very similar to joining non-spatial tables together; all the same rules of a join apply. We need a matching identifier between the two datasets to use as our join field, but instead of an inner, left, or right join we use sp::merge. For anyone who has worked in a traditional GIS program such as ArcGIS or QGIS, this is analogous to the attribute join you use in those environments.
# The spatial object needs to be listed first in the merge function for the result to be a spatial object.
acs_spatial <-
sp::merge(x = census_boundaries, y = acs_data, by = "GEOID")
# How would you inspect the results?
Now that we have inspected the data we can see that we have many ‘NA’ values. This is because we have more spatial data, more polygons, than we do attribute information in the table we joined. The ACS data I provided only has information for the census tracts in Philadelphia whereas the polygon shapefile contains all census tracts in Pennsylvania. To resolve this we need to subset the data. We can do this one of two ways, by attribute or by spatial selection. I am going to show you both.
By attribute: To subset the data to Philadelphia County only, which contains the Philadelphia census tracts, we need to keep only the observations where 'COUNTYFP' is 101. This is Philadelphia's federally recognized county code.
For other codes please visit this website.
# Subset the data
acs_philly <- subset(x = acs_spatial, subset = COUNTYFP == 101)
# Check the results
head(acs_philly@data)
## GEOID STATEFP COUNTYFP TRACTCE AFFGEOID NAME.x LSAD
## 2411 42101000200 42 101 000200 1400000US42101000200 2 CT
## 2415 42101000500 42 101 000500 1400000US42101000500 5 CT
## 2421 42101000901 42 101 000901 1400000US42101000901 9.01 CT
## 2424 42101001002 42 101 001002 1400000US42101001002 10.02 CT
## 2436 42101002000 42 101 002000 1400000US42101002000 20 CT
## 2456 42101003901 42 101 003901 1400000US42101003901 39.01 CT
## ALAND AWATER NAME.y
## 2411 382481 0 Census Tract 2, Philadelphia County, Pennsylvania
## 2415 428780 0 Census Tract 5, Philadelphia County, Pennsylvania
## 2421 105511 0 Census Tract 9.01, Philadelphia County, Pennsylvania
## 2424 471655 0 Census Tract 10.02, Philadelphia County, Pennsylvania
## 2436 289368 0 Census Tract 20, Philadelphia County, Pennsylvania
## 2456 420397 0 Census Tract 39.01, Philadelphia County, Pennsylvania
## variable B08121_001_estimate B08121_001_moe
## 2411 B08121_001 40472 17900
## 2415 B08121_001 52625 6852
## 2421 B08121_001 42883 3233
## 2424 B08121_001 68472 11248
## 2436 B08121_001 30757 2364
## 2456 B08121_001 31068 7354
## Total_Worked_At_Home_estimate Total_Worked_At_Home_021_moe
## 2411 8 13
## 2415 36 21
## 2421 57 56
## 2424 169 70
## 2436 31 30
## 2456 326 363
## Total_Married_Couple_Family_estimate Total_Married_Couple_Family_moe
## 2411 438 86
## 2415 122 39
## 2421 133 56
## 2424 690 88
## 2436 125 42
## 2456 868 178
## Total_PopOneRace_White_estimate Total_PopOneRace_White_moe
## 2411 825 271
## 2415 1126 231
## 2421 1521 223
## 2424 3144 240
## 2436 494 167
## 2456 4112 830
unique(acs_philly@data$COUNTYFP)
## [1] 101
## 67 Levels: 001 003 005 007 009 011 013 015 017 019 021 023 025 027 ... 133
# Quick plot to look at the data
plot(acs_philly)
Spatial selection:
Although we have a field that we can use to subset our data, there may be future instances where this is not the case. In those instances we may want to try a spatial selection. This is where we overlay two shapefiles (or other spatial data formats) and select features based on overlapping spatial geometries. The example here: we want all census tract polygons that fall within the boundary of the Philadelphia County shapefile.
# I have provided you with a second shapefile, an outline for
# Philadelphia county. Lets load that data here.
# Load Philadelphia County shapefile
Phila_County <-
readOGR(dsn = "./Philadelphia_County_Boundary",
layer = "Philadelphia_County_NAD83")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\mary.lennon\Documents\RLadies_Spatial\Philadelphia_County_Boundary", layer: "Philadelphia_County_NAD83"
## with 1 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
# Spatial selection
# spgeom1 is the layer we want to use to do our selection
# spgeom2 is the layer we want our selection from
acs_philly <-
raster::intersect(Phila_County,
acs_spatial)
## Warning in raster::intersect(Phila_County, acs_spatial): non identical CRS
## Warning in raster::intersect(Phila_County, acs_spatial): polygons do not
## intersect
Wait…this doesn’t work!
Mary…you said we could do this!!
This is a very important error and learning point. Let’s look at that error a little more closely…
“spgeom1 and spgeom2 have different proj4 strings”
This is because our coordinate reference systems (CRS) differ.
AKA Re-projecting
Coordinate reference systems, often discussed interchangeably with projections, are really important when working with spatial data. "The Coordinate Reference System (CRS) of spatial objects defines where they are placed on the Earth's surface" (Lovelace, Cheshire, & Oldroyd, 2017). We need a CRS because representing locations on the curved, 3D Earth with 2D coordinates requires some math, and there are many different ways to do it. Geographic coordinate systems locate positions directly on a model of the curved surface (in degrees), while projected coordinate systems flatten that surface onto a plane (in linear units such as feet or meters). Many systems use datums that are specific to a country. These factors and more cause different representations of the same spatial data. Working with data in two different reference systems in one project causes a compatibility issue. Often we cannot control the reference system of the data we receive, but we can re-project the data into a common system.
There is a lot more that could be said on this topic, if you are interested. Here is a primer provided by ESRI (a large supplier of GIS technology) that does a good job.
To find a reference list of coordinate systems, go to epsg.io.
# Let's review what I taught you about slots with the spatial data.
# There was the one slot I said we would need to remember that
# is important: proj4string. If we look at that we can see what the
# CRS is for the spatial data.
# The ACS spatial data
acs_spatial@proj4string
## CRS arguments:
## +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0
# The Philadelphia county boundary
Phila_County@proj4string
## CRS arguments:
## +proj=lcc +lat_1=40.96666666666667 +lat_2=39.93333333333333
## +lat_0=39.33333333333334 +lon_0=-77.75 +x_0=600000.0000000001
## +y_0=0 +datum=NAD83 +units=us-ft +no_defs +ellps=GRS80
## +towgs84=0,0,0
# We can also use this function from the sp package
proj4string(Phila_County)
## [1] "+proj=lcc +lat_1=40.96666666666667 +lat_2=39.93333333333333 +lat_0=39.33333333333334 +lon_0=-77.75 +x_0=600000.0000000001 +y_0=0 +datum=NAD83 +units=us-ft +no_defs +ellps=GRS80 +towgs84=0,0,0"
# We aren't going to worry about what all of the bits of these strings
# mean. That is beyond the scope of this workshop. We are going to note
# that they are clearly different. So...we need to re-project the data
# to be in the same CRS. I am choosing World Geodetic System 84 as it
# is commonly used.
# It's good practice to set the CRS as a variable so you can easily
# refer to it later. We will do so here.
WGS84 <- "+init=epsg:4326"
# Re-project the data using the the sp package
Phila_County <- sp::spTransform(x = Phila_County, CRSobj = WGS84)
acs_spatial <- sp::spTransform(x = acs_spatial, CRSobj = WGS84)
# Please Note: Transforming/re-projecting the data recomputes the
# coordinates; it is more than just changing the CRS label. Please ALWAYS
# make sure you are transforming/re-projecting the data and not just
# renaming the CRS.
# Check the CRS
proj4string(Phila_County)
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
proj4string(acs_spatial)
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
# Yay!! Everything is in the same CRS!!! Now, let's try that spatial
# selection again.
acs_philly <-
raster::intersect(Phila_County,
acs_spatial)
# Let's check that everything looks good with the data.
# First good sign: we have the correct number of observations in the table (see our Environment).
# There are 384 observations in both acs_philly and acs_data. We know that acs_data, the attribute
# data we originally started with, had the correct number of observations, so this is a good sign.
# Let's check the attributes of the data
head(acs_philly@data)
## STATEFP.1 COUNTYFP.1 COUNTYNS AFFGEOID.1 GEOID.1 NAME LSAD.1
## 1 42 101 01209187 0500000US42101 42101 Philadelphia 06
## 2 42 101 01209187 0500000US42101 42101 Philadelphia 06
## 3 42 101 01209187 0500000US42101 42101 Philadelphia 06
## 4 42 101 01209187 0500000US42101 42101 Philadelphia 06
## 5 42 101 01209187 0500000US42101 42101 Philadelphia 06
## 6 42 101 01209187 0500000US42101 42101 Philadelphia 06
## ALAND.1 AWATER.1 GEOID.2 STATEFP.2 COUNTYFP.2 TRACTCE
## 1 347520038 22086063 42101000200 42 101 000200
## 2 347520038 22086063 42101000500 42 101 000500
## 3 347520038 22086063 42101000901 42 101 000901
## 4 347520038 22086063 42101001002 42 101 001002
## 5 347520038 22086063 42101002000 42 101 002000
## 6 347520038 22086063 42101003901 42 101 003901
## AFFGEOID.2 NAME.x LSAD.2 ALAND.2 AWATER.2
## 1 1400000US42101000200 2 CT 382481 0
## 2 1400000US42101000500 5 CT 428780 0
## 3 1400000US42101000901 9.01 CT 105511 0
## 4 1400000US42101001002 10.02 CT 471655 0
## 5 1400000US42101002000 20 CT 289368 0
## 6 1400000US42101003901 39.01 CT 420397 0
## NAME.y variable
## 1 Census Tract 2, Philadelphia County, Pennsylvania B08121_001
## 2 Census Tract 5, Philadelphia County, Pennsylvania B08121_001
## 3 Census Tract 9.01, Philadelphia County, Pennsylvania B08121_001
## 4 Census Tract 10.02, Philadelphia County, Pennsylvania B08121_001
## 5 Census Tract 20, Philadelphia County, Pennsylvania B08121_001
## 6 Census Tract 39.01, Philadelphia County, Pennsylvania B08121_001
## B08121_001_estimate B08121_001_moe Total_Worked_At_Home_estimate
## 1 40472 17900 8
## 2 52625 6852 36
## 3 42883 3233 57
## 4 68472 11248 169
## 5 30757 2364 31
## 6 31068 7354 326
## Total_Worked_At_Home_021_moe Total_Married_Couple_Family_estimate
## 1 13 438
## 2 21 122
## 3 56 133
## 4 70 690
## 5 30 125
## 6 363 868
## Total_Married_Couple_Family_moe Total_PopOneRace_White_estimate
## 1 86 825
## 2 39 1126
## 3 56 1521
## 4 88 3144
## 5 42 494
## 6 178 4112
## Total_PopOneRace_White_moe
## 1 271
## 2 231
## 3 223
## 4 240
## 5 167
## 6 830
# Also, a quick plot to make sure everything looks correct
plot(acs_philly)
As you may have noticed, there are a lot of other variables in our dataset that we may or may not need, and not all of them are usefully named. Before we map this data we are going to clean up our dataset a little and add a new variable to aid in our mapping. We will keep it relatively simple and create a classification of the variable B08121_001_estimate, which is the median earnings in the last 12 months.
Please Note: Each ACS variable comes with an estimate and an MOE (margin of error). ACS estimates are based on a sample of the population, and the MOE describes the possible variation around each estimate. We will only concern ourselves with the estimate in this workshop, although I have provided the MOE for you.
For more on the MOE.
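One useful rule of thumb: ACS MOEs are published at the 90 percent confidence level, so dividing an MOE by 1.645 recovers the estimate's standard error. A quick sketch using Census Tract 1's values from the table above:

```r
# Census Tract 1: median earnings estimate and margin of error (90% level)
estimate <- 63650
moe      <- 9854

# Standard error: MOE / 1.645 (the 90% z-score the Census Bureau uses)
se <- moe / 1.645

# A rough 90% interval around the estimate
lower <- estimate - moe
upper <- estimate + moe
c(se = se, lower = lower, upper = upper)
```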
# First we are going to cut down the number of variables in the dataset.
# We are going to accomplish this with the subset function (note: base R,
# not the tidyverse; it also works on the Spatial objects we have loaded).
# This time we use the select argument of subset to
# choose the variables we want to keep.
# For the purposes of this exercise we are going
# to limit our efforts to State (STATEFP.1), County (COUNTYFP.1),
# Census Tract (TRACTCE), Tract Name (NAME.y), the estimates (_estimate),
# and the MOEs for the dataset (_moe).
acs_philly <- subset(
x = acs_philly,
select = c(
"STATEFP.1",
"COUNTYFP.1",
"TRACTCE",
"NAME.y",
"B08121_001_estimate",
"B08121_001_moe",
"Total_Worked_At_Home_estimate",
"Total_Worked_At_Home_021_moe",
"Total_Married_Couple_Family_estimate",
"Total_Married_Couple_Family_moe",
"Total_PopOneRace_White_estimate",
"Total_PopOneRace_White_moe"
)
)
# New variable names
var_names <-
c("State",
"County",
"Census_Tract",
"Tract_Name",
"MedInc_Est",
"MedInc_Moe",
"Total_Worked_At_Home_estimate",
"Total_Worked_At_Home_021_moe",
"Total_Married_Couple_Family_estimate",
"Total_Married_Couple_Family_moe",
"Total_PopOneRace_White_estimate",
"Total_PopOneRace_White_moe"
)
# Rename the variables
names(acs_philly@data) <- var_names
# Create a new variable that bands the median income by quartiles.
acs_philly@data$MedIncQuart <-
ntile(x = acs_philly@data$MedInc_Est, n = 4)
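For the curious, dplyr's ntile() used above can be approximated in base R with rank(); a sketch on a few invented income values:

```r
# Toy income vector (illustrative values)
income <- c(30757, 40472, 42883, 52625, 63650, 68472, 70300, 54250)

# dplyr-style ntile in base R: rank the values, then slice the
# ranks into n roughly equal-sized groups
base_ntile <- function(x, n) {
  ceiling(n * rank(x, ties.method = "first") / length(x))
}

base_ntile(income, 4) # quartile band (1-4) for each tract
```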
We have made it so far! These are a lot of the basics of spatial data… now on to, arguably, the most fun part… mapping!!!!!!!!!!!!
We are getting fancy now!
We have already created basic maps, using plot(), that show the outline of the spatial data. We are now going to create a more complex choropleth map. A choropleth map shades or colors geographic areas differently based on the variable being depicted. There are several packages we can use to map data; we are going to stick to tmap in this workshop. I personally find that ggplot/ggmap is not the most intuitive. There are benefits to working with ggplot for mapping, though, so I encourage exploring it outside of this tutorial.
# For a quick single topic map we can use tmap's qtm function.
tmap::qtm(acs_philly, fill = "MedIncQuart")
## Warning: package 'sf' was built under R version 3.5.1
## Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
# We can continue to modify the map using other tmap functions.
# To use these other functions, we first load the data in with
# the tm_shape function. We then can build upon this with tm_fill,
# tm_borders, tm_compass, tm_layout, etc.
# Simple example of tm_shape and tm_fill
tm_shape(acs_philly) + tm_fill("MedIncQuart")
# There is a variety of color palettes we can choose from.
# A popular option is RColorBrewer. Maps are often
# used to tell a story. As with any set of statistics, there
# are a variety of ways to tell a story with your data.
# The story can be changed through different presentations of the
# numbers, whether through graphs or visualizations such as maps.
# One way to affect the story the numbers tell, or to set a tone
# with your visualization, is through color.
# ColorBrewer is a common go-to set of color palettes.
# To see all possible ColorBrewer options use the following function:
display.brewer.all()
# For more information on ColorBrewer: http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3
# A map using a ColorBrewer palette:
tm_shape(acs_philly) + tm_fill("MedIncQuart", palette = "Blues")
tm_shape(acs_philly) + tm_fill("MedIncQuart", palette = "-Blues") # Reversed
# As mentioned above, the way numbers are presented affects
# the message of a graphic. We can show our data different ways
# easily with the tmap package. We should always take care in
# how we visualize our data as we can distort our message.
# Take a look at these different types of breaks in the
# median income attribute. How do they change the
# message the graphic is telling? For more on this topic
# https://blog.cartographica.com/blog/2010/8/16/gis-data-classifications-in-cartographica.html
# Quantile
tm_shape(acs_philly) + tm_fill(
"MedInc_Est",
style = "quantile",
n = 5,
palette = "Reds",
legend.hist = TRUE
) +
tm_layout(
"ACS 2017 - Median Income",
legend.title.size = 1,
legend.text.size = 0.6,
legend.width = 1.0,
legend.outside = TRUE,
legend.bg.color = "white",
legend.bg.alpha = 0,
frame = FALSE
)
# + tm_compass() # If you want a north arrow
# Pretty
tm_shape(acs_philly) + tm_fill(
"MedInc_Est",
style = "pretty",
n = 5,
palette = "Reds",
legend.hist = TRUE
) +
tm_layout(
"ACS 2017 - Median Income",
legend.title.size = 1,
legend.text.size = 0.6,
legend.width = 1.0,
legend.outside = TRUE,
legend.bg.color = "white",
legend.bg.alpha = 0,
frame = FALSE
)
# Equal
tm_shape(acs_philly) + tm_fill(
"MedInc_Est",
style = "equal",
n = 5,
palette = "Reds",
legend.hist = TRUE
) +
tm_layout(
"ACS 2017 - Median Income",
legend.title.size = 1,
legend.text.size = 0.6,
legend.width = 1.0,
legend.outside = TRUE,
legend.bg.color = "white",
legend.bg.alpha = 0,
frame = FALSE
)
# Jenks
tm_shape(acs_philly) + tm_fill(
"MedInc_Est",
style = "jenks",
n = 5,
palette = "Reds",
legend.hist = TRUE
) +
tm_layout(
"ACS 2017 - Median Income",
legend.title.size = 1,
legend.text.size = 0.6,
legend.width = 1.0,
legend.outside = TRUE,
legend.bg.color = "white",
legend.bg.alpha = 0,
frame = FALSE
)
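To see why those four maps differ, it helps to compute the breaks directly. Here is a base-R sketch contrasting quantile and equal-interval breaks on an invented, skewed income vector (the values are made up; tmap computes its breaks internally, but the idea is the same):

```r
# A skewed toy income distribution: mostly modest values plus a few
# very high-income tracts
set.seed(42)
income <- c(rlnorm(95, meanlog = 10.6, sdlog = 0.4), rep(250000, 5))

# Quantile: equal COUNTS of tracts per class
quantile_breaks <- quantile(income, probs = seq(0, 1, length.out = 6))

# Equal interval: equal WIDTH classes between min and max
equal_breaks <- seq(min(income), max(income), length.out = 6)

# With skewed data the two classifications disagree sharply: most
# observations pile into the lowest equal-interval classes, while
# each quantile class holds 20% of observations by construction.
table(cut(income, equal_breaks, include.lowest = TRUE))
```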
# We can layer different shapefiles in one map. There are many reasons we
# might want to do this. For demonstration purposes, we are going to
# layer SEPTA Regional Rail lines and stations.
# Read in the Philadelphia Regional Rail shapefile
SEPTA_RR <-
readOGR(dsn = "C:/Users/mary.lennon/Desktop/RLadies_Presentation/SEPTA_RegionalRail",
layer = "SEPTAGISRegionalRailLines_201207")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\mary.lennon\Desktop\RLadies_Presentation\SEPTA_RegionalRail", layer: "SEPTAGISRegionalRailLines_201207"
## with 13 features
## It has 12 fields
## Integer64 fields read as strings: OBJECTID Id
# SEPTA Stations shapefile
SEPTA_Staions <-
readOGR(dsn = "C:/Users/mary.lennon/Desktop/RLadies_Presentation/SEPTA_RegionalRail",
layer = "SEPTAGISRegionalRailStations_2016")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\mary.lennon\Desktop\RLadies_Presentation\SEPTA_RegionalRail", layer: "SEPTAGISRegionalRailStations_2016"
## with 155 features
## It has 12 fields
# Make sure everything is in the correct CRS
SEPTA_RR <- sp::spTransform(x = SEPTA_RR, CRSobj = WGS84)
SEPTA_Staions <- sp::spTransform(x = SEPTA_Staions, CRSobj = WGS84)
# Plot
tm_shape(acs_philly) + tm_fill(
"MedInc_Est",
style = "jenks",
n = 5,
palette = "Reds",
legend.hist = TRUE
) +
tm_shape(SEPTA_RR) +
tm_lines(col = "black", scale = 1, alpha = 0.25) +
tm_shape(SEPTA_Staions) +
tm_dots(
col = "black",
scale = 2.5,
alpha = 0.7,
shape = 16
) +
tm_layout(
"ACS 2017 - Median Income",
legend.title.size = 1,
legend.text.size = 0.6,
legend.width = 1.0,
legend.outside = TRUE,
legend.bg.color = "white",
legend.bg.alpha = 0,
frame = FALSE
)
# Change the style
# See the options listed on your console when you run this code.
# Play around with them!
tmap_style("classic") +
tm_shape(acs_philly) + tm_fill(
"MedInc_Est",
style = "jenks",
n = 5,
palette = "Reds",
legend.hist = TRUE
) +
tm_shape(SEPTA_RR) +
tm_lines(col = "black", scale = 1, alpha = 0.25) +
tm_shape(SEPTA_Staions) +
tm_dots(
col = "black",
scale = 2.5,
alpha = 0.7,
shape = 16
) +
tm_layout(
"ACS 2017 - Median Income",
legend.title.size = 1,
legend.text.size = 0.6,
legend.width = 1.0,
legend.outside = TRUE,
legend.bg.color = "white",
legend.bg.alpha = 0,
frame = FALSE
)
## tmap style set to "classic"
## other available styles are: "white", "gray", "natural", "cobalt", "col_blind", "albatross", "beaver", "bw", "watercolor"
# Add a basemap
# I have not figured out a way to have a static basemap with tmap.
# However, we can easily get interactive maps where
# we can change the basemap as we please!
# To work with interactive maps we need to switch the mode (tmap_mode)
# to "view". The other option is tmap_mode("plot"), which disengages the
# interactive quality of the maps. You can toggle
# between the two modes by including ttm() in your code!
tmap_mode("view") # View for interactive
## tmap mode set to interactive viewing
tm_basemap(providers$OpenStreetMap.BlackAndWhite) +
tm_shape(acs_philly) + tm_fill(
"MedInc_Est",
style = "jenks",
n = 5,
palette = "Reds",
legend.hist = TRUE
) +
tm_shape(SEPTA_RR) +
tm_lines(col = "black", scale = 1, alpha = 0.25) +
tm_shape(SEPTA_Staions) +
tm_dots(
col = "black",
scale = 1.5,
alpha = 0.7,
shape = 16
) +
tm_layout(
"ACS 2017 - Median Income",
legend.title.size = 1,
legend.text.size = 0.6,
legend.width = 1.0,
legend.outside = TRUE,
legend.bg.color = "white",
legend.bg.alpha = 0,
frame = FALSE
)
# Try different types of basemaps here: http://leaflet-extras.github.io/leaflet-providers/preview/
# Switch back to plotting mode - remove the hash on the below code
# if you want to try this out!
#ttm()
I encourage you to keep playing with all the options presented here and more! If you get a chance, there is also faceting functionality available through the tm_facets function.
A slippy map is a web map you can pan and zoom around. This exercise gives you an opportunity to make slippy maps with leaflet outside of tmap, and an opportunity to work with an API! API stands for Application Programming Interface. An API, such as this one for Meetup, allows remote servers to continuously and easily (on the user's end) exchange data, with no need to save anything on our own servers. Using the Meetup API we are going to look at all meetups within a 10 mile radius of an address. As awesome as this idea is, I have to give credit to my lovely friend Ellen Talbot who runs R-Ladies Liverpool in the UK (@RLadiesLivUK)! The example presented here uses the WeWork location of today's meetup instead of a home address.
Go to the following link to get your Meetup API signed URL: (https://secure.meetup.com/meetup_api/console/?path=/recommended/groups)
Fill out the following fields:
Zip - Your postcode, or 19148 for WeWork
Country - United States
Radius - 10
Omit - description
For more information on the fields in the API request click here
#Replace this signed URL with your own!
meetup <-
as.data.frame(
fromJSON(
"https://api.meetup.com/recommended/groups?photo-host=public&zip=19148&country=United+States&sig_id=206834555&radius=20&sig=dc1629cf2ba56c58e71f555c31aa89562d9a5348"
)
)
# Bind together the coordinates
coords <- cbind(meetup$lat, meetup$lon)
# Turn these coords into points
sp <- SpatialPoints(coords)
# Join the points to the original data
spdf <- SpatialPointsDataFrame(coords, meetup)
# Add to a leaflet map and voilà!
leaflet(data = spdf) %>% addTiles() %>%
addMarkers(
~ lon,
~ lat,
popup = ~ as.character(name),
label = ~ as.character(name)
)
You can play around with other settings for this leaflet map by following some of the instructions here.
Often when working with spatial data we need to create our own shapefiles. This could be anything from drawing a polygon to creating points from lat/long data. Additionally, there are many instances where it is useful to buffer data (especially points) to do other types of spatial selections. In this bonus material we are going to create a point shapefile from lat/long coordinates and buffer those points by 500 meters. We will use the Meetup API for this exercise as it contains lat/long data and is likely already loaded into your environment. The Meetup API is introduced in Bonus #1. If you skipped that activity, please complete the first bit of it now to retrieve the Meetup data.
# Grab the coordinates from the Meetup data
xy <- meetup[, c("lon", "lat")]
# Turn it into a spatial points dataframe
spdf <- SpatialPointsDataFrame(
coords = xy,
data = meetup,
proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
)
# Quick check of the data
plot(spdf)
# We can buffer the data with the buffer function from the raster package.
# Buffering is done in the units of the CRS. For data in a geographic
# CRS such as WGS84, that means degrees; for a projected CRS the units
# are linear, e.g. feet or meters. I will re-project the data here to
# NAD83 / UTM Zone 18N (EPSG:26918), whose unit is the meter. This gives
# us manageable units to work with and introduces you to a common
# coordinate system for Philadelphia and the surrounding area.
# Re-project the data
spdf <- sp::spTransform(x = spdf, CRSobj = "+init=epsg:26918")
# Buffer the data
meetup_buffer <- raster::buffer(spdf, width = 500) # The unit is now meters
# View the results
tmap::tm_shape(meetup_buffer) + tmap::tm_borders(col = "blue")
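To see why buffering in a degree-based CRS is a trap, consider the ground distance of one degree. A rough back-of-the-envelope check at Philadelphia's latitude (~40°N), using the common ~111,320 m per degree approximation:

```r
# Approximate ground distance of one degree at latitude 40 N
lat <- 40
meters_per_deg_lat <- 111320                       # roughly constant
meters_per_deg_lon <- 111320 * cos(lat * pi / 180) # shrinks toward the poles

# One degree of longitude at 40 N is on the order of 85 km:
meters_per_deg_lon

# So a 500 m buffer expressed in degrees of longitude is tiny -
# asking for width = 500 in a degree-based CRS would be absurd.
500 / meters_per_deg_lon # ~0.006 degrees
```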
Further work: We learned how to do spatial selections earlier in the workshop, so see if you can select points by polygons. If you are feeling ambitious, see if you can aggregate the data to the polygon layer. Then feel free to create maps or do any of the other analyses we covered today!
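As a nudge for that further work: points can be matched to polygons with sp::over(), which returns, for each point, the index (or attributes) of the polygon it falls in, and NA if none. A tiny self-contained sketch with invented geometry:

```r
library(sp)

# One square polygon covering a toy area
ring  <- Polygon(cbind(c(0, 2, 2, 0, 0), c(0, 0, 2, 2, 0)))
polys <- SpatialPolygons(list(Polygons(list(ring), ID = "square")))

# Three points: two inside the square, one outside
pts <- SpatialPoints(cbind(c(1, 1.5, 5), c(1, 0.5, 5)))

# over() gives the index of the containing polygon (NA = outside)
hits <- over(pts, polys)
hits # 1 1 NA

# Keep only the points that fall inside a polygon
pts_inside <- pts[!is.na(hits)]
```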
If you want to continue to explore other spatial data, either during this workshop or afterward, these are some great places to go!
PASDA - Open GIS Data Access for Pennsylvania. Includes a variety of different types of data both raster and vector ranging from centerlines to roads to flood depth grids.
Open Data Philly - A catalog of all the open data in the Philadelphia region (some of which is spatial). The repository covers topics from arts and culture to politics and real-estate.
National Map Viewer - The data download for the National Map Viewer, maintained by the United States Geological Survey, primarily has land cover and elevation data. This is a good place to get a raster to play with.
Open Government - Open data repository for the US government covering everything from agriculture to maritime and finance.
Burrough, P. A., McDonnell, R. A., & Lloyd, C. D. (2015). Principles of Geographical Information Systems. New York: Oxford University Press.
ESRI. (n.d.). What is a shapefile? Retrieved November 28, 2018, from ArcGIS for Desktop: http://desktop.arcgis.com/en/arcmap/10.3/manage-data/shapefiles/what-is-a-shapefile.htm
Lovelace, R., Cheshire, J., & Oldroyd, R. (2017). Introduction to Visualizing Spatial Data in R.
Openshaw, S. (1983). The Modifiable Areal Unit Problem. Concepts and Techniques in Modern Geography, 38, 1-41.
Openshaw, S. (1984). Ecological fallacies and the analysis of areal census data. Environment and Planning A, 16, 17-31.
US Census Bureau. (2018, June 17). About the American Community Survey. Retrieved November 28, 2018, from Census.gov: https://www.census.gov/programs-surveys/acs/about.html